COUNSELORS OFTEN ADMINISTER TESTS OF QUESTIONABLE
VALIDITY. IN RELIABILITY STUDIES, EVERY PRECAUTION IS TAKEN
TO STABILIZE THE STIMULUS SITUATION. IN ASSESSING VALIDITY,
CONCERN CENTERS ON BEHAVIOR UNDER DIFFERENT STIMULUS
CONDITIONS. CRONBACH'S THEORETICAL LIMIT FOR A VALIDITY
COEFFICIENT OF A TEST IS THE SQUARE ROOT OF THE RELIABILITY
COEFFICIENT. GHISELLI, IN REVIEWING HUNDREDS OF VALIDITY
STUDIES COMPLETED BETWEEN 1919 AND 1964, FOUND THAT NONE OF
HIS FOUR MAJOR CLASSES OF APTITUDE TESTS FORECAST PROFICIENCY
ON ANY JOB WITH A HIGH DEGREE OF ACCURACY. ALTHOUGH TESTS CAN
HAVE A SUFFICIENTLY HIGH DEGREE OF PREDICTIVE POWER TO BE OF
PRACTICAL VALUE IN PERSONNEL SELECTION, THEY MAY ALSO
FRUSTRATE THE COUNSELOR'S CHANCES OF HELPING THE CLIENT SOLVE
IMMEDIATE PROBLEMS. IN ACTUARIAL INTERPRETATION AND BEHAVIOR
PREDICTION BASED ON TEST DATA, GREATER RELIANCE SHOULD BE
PLACED ON MULTIVARIATE ANALYSIS. RIGOROUS COLLECTION,
ANALYSIS, AND REPORTING OF PREDICTION AND CRITERION DATA ARE
NECESSARY. USE OF GOLDMAN'S MULTIDIMENSIONAL APPROACH CAN
HELP THE CLIENT BY DEFINING THE ESSENTIAL ELEMENTS OF THE
PROBLEM. UNTIL MORE VALID TESTS ARE DEVELOPED, WE MUST CHOOSE
BETWEEN REFUSAL TO USE TESTS, AND USING THEM AS PART OF AN
EXTENSIVE DESCRIPTION. THIS SPEECH WAS PRESENTED AT THE
AMERICAN PSYCHOLOGICAL ASSOCIATION CONVENTION, WASHINGTON,
D.C., SEPTEMBER 2, 1967. (PR)

Counselors collaborate with clients to assist them in
resolving their immediate problems. These problems may involve
vocational choice, educational decisions, social pressures,
emotional stress, or, more likely, some mixture of stimuli for
which no appropriate response pattern is spontaneously available
to the client. The effective counselor, however, goes beyond
this immediate objective and seeks to achieve the ultimate goal
of counseling, aiding the client to acquire generalized
problem-solving behavior.



It makes little difference how we characterize the ultimate
purpose of counseling. We may label this adjustment, enhancement
of the phenomenal self, development of ego strength, acceptance of
the existential condition, serenity, or what have you. What the
conscientious and responsible counselor seeks to do is to aid his
client in acquiring a highly organized and highly energized system
of psychological mechanisms that will permit the client to check
conflict as or before it arises and to move forward toward
meaningful, satisfying, distant goals. We have chosen to call these
psychological mechanisms generalized problem-solving behavior and
the state in which they function spontaneously and efficiently,
serenity (Weitz, 1964, pp. 143-144).

One of the principal considerations in the solution of human
problems is a recognition of and a willingness to accept reality
as it is. Unfortunately, in the case of human interactions,
reality is the product of behavior performed on the basis of
perceptions and misperceptions of objects and events. Thus, reality
is seldom an unambiguous event. Counseling, however, is designed
to aid in the achievement of generalized problem-solving behavior
by reducing the client's misperceptions and clarifying some of the
ambiguities in his segment of reality.

It is in this context that I should like to consider with you
the use of tests in counseling. With few exceptions, counseling
psychologists use tests in helping their clients identify their
problems and find solutions to them, although, as we all know,
counselors of certain persuasions do so reluctantly and some even
truculently, with the consequent ill effects equaled only by the
"test and tell" school of counselors. If tests, then, are widely
used in counseling (and the economic euphoria enjoyed by many test
publishers is evidence of this wide use), we are confronted with the
question of whether or not test scores can provide us a basis for
the correction of clients' misperceptions and a means of clarifying
some of the ambiguities of perceived reality. A lifetime of using
tests in counseling has led some of us, in our few moments of
vision, to suspect that testing in guidance may be a piece of
superstitious tribal ritual that permits us to accept our
error-laden predictions with less pain and anxiety.
Answers to the question of whether or not testing is likely
to contribute to our client's acquisition of generalized
problem-solving behavior may be found in an examination of the
validities of some of the kinds of tests we use.

The validity of a test, as we all know, is the degree to which
it measures what it purports to measure. There are, again as we all
know, many different kinds of test validity, all more or less suited
to the different purposes of different test users. In counseling,
however, we are most concerned with predictive validity. When a
counselor orders the administration of a test in the course of
counseling, he wants to be able to say, "If the client exhibits a
given level of a definable kind of behavior (as measured by the
test) today, he is likely to exhibit a similar level of the same
kind of behavior in the future under different, but similar,
circumstances."

Let us look at it this way: A test is, in essence, a sample
of behavior. If we wish to make some estimate of a client's
mathematical behavior, say, we confront him with a sample of
mathematical stimuli and observe the ways in which he makes his
responses. If our sample of mathematical stimuli is well chosen,
it will represent the total set of stimuli that have confronted or
are likely to confront the client and hence will evoke a variety of
mathematical responses available to the client. When his responses
are compared with the responses of other examinees who have had a
similar opportunity to acquire mathematical responses, we can assign
a number to his behavior that not only tells us something about his
present total mathematical behavior but also tells us something
about his probable future behavior in this domain. This prediction
is possible because of our experience with large numbers of
observations of mathematical behavior.

When we are assessing the reliability of a test, we try,
insofar as possible, to maintain fairly constant stimulus situations.
Consider, for example, the test-retest method of estimating
reliability. Here we try to reproduce the same test conditions and,
in many instances, as when the retesting is done with the same form
of the test, the specific mathematical stimuli remain the same. In
the case of alternate form retesting, the specific stimuli are
changed, but they are assumed to activate the same behavioral
responses. Thus in reliability studies, we take every precaution
to stabilize the stimulus element of the behavior product in order
to estimate the effects of chance factors irrelevant to the
restricted range of behavior being sampled.

When we are assessing validity, however, we are concerned
with the degree to which the behavior under observation functions
under different sets of stimulus conditions. Thus we may sample
mathematical behavior with a test of arithmetic speed and accuracy in
which the stimulus items are problems of addition, subtraction,
multiplication, and division of three-digit numbers. In order to
measure the validity of this test we may sample the behavior of a
group of bookkeepers and rate them with respect to speed and
accuracy on their jobs over a period of time and then determine the
degree of correlation between their test scores and their ratings.
The significant factor here is that a major element in the behavior
product, the stimulus element, is markedly different in the test
and criterion samples although both have the generally common factor
of arithmetical stimuli.

When we take two samples of behavior in which every effort is
made to insure similarity of both stimuli and responses, as in the
case of reliability studies, we never get a perfect correspondence
between the two samples. Reliability coefficients in excess of .90
are considered exceptional within restricted range samples of
subjects. Consequently, one would expect that samples of behavior
in which a major element, the stimulus, is varied, would produce
even lower coefficients of validity. And, in fact, as we all know,
they do. But how low? Cronbach (1960, p. 132) tells us that a
validity coefficient can never exceed the square root of the
reliability coefficient of a test. Thus a test with a reliability
of .90 can be expected to have a validity coefficient of .95 or
less with some external criterion. This is the theoretical limit,
but how much less do we find in actual practice in the case of the
tests used in counseling?
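
The ceiling cited from Cronbach can be checked with a line of arithmetic. The sketch below is offered only as an illustration: the function name `max_validity` is hypothetical, the reliabilities other than .90 are chosen for the example, and the bound is taken exactly as stated above (validity no greater than the square root of reliability).

```python
import math


def max_validity(reliability: float) -> float:
    """Upper bound on a validity coefficient for a given reliability.

    Hypothetical helper: applies the square-root limit quoted in the text.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability)


# .90 is the case discussed in the text; the other values are illustrative.
for r_xx in (0.90, 0.80, 0.70):
    print(f"reliability {r_xx:.2f} -> validity ceiling {max_validity(r_xx):.2f}")
```

For a reliability of .90 the ceiling rounds to .95, the figure given above; less reliable tests have correspondingly lower ceilings.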

I should like to consider with you what can happen in the
case of using tests in the area of vocational counseling. To be
sure, this is not the only kind of problem confronting the counselor,
but we seem to know more about tests in this area than we do in
some others, so that what we have to say about the findings here can
be applied in other areas provided due allowance is made for our
more limited knowledge there.

Ghiselli reviewed hundreds of validity studies, both published
and unpublished, that were completed between 1919 and 1964, and
summarized his findings in a fascinating if frightening little
book entitled The Validity of Occupational Aptitude Tests (Ghiselli,
1966). After finding that interest and personality tests contributed
little to the prediction of occupational performance, he grouped
the remaining tests into four major categories: (1) intellectual
abilities, (2) spatial and mechanical abilities, (3) perceptual
accuracy, and (4) motor abilities. He summarized the validity
coefficients for many different tests in each of these "aptitude"
areas against two principal criterion measures for a wide variety
of occupational groups. The criterion measures were training and
job proficiency.

Ghiselli reports that:

. . . none of the major classes of tests forecasts
proficiency on any job with a high degree of accuracy.
Although in a number of instances tests have moderate
validity, their power to predict success on the actual
job is substantially less than their power to predict
trainability. (Ghiselli, 1966, p. 64.)

He points out that, "Taking all tests as a whole for training
it will be observed . . . that nearly half the average
validity coefficients, 47.3 per cent, are at least moderately high,
being .30 or greater. Indeed," he continues, "nearly three quarters
of them, 72.5 per cent, are above .20, which is perhaps the lower
limit of usefulness." (Ghiselli, 1966, p. 123.)
When we move to proficiency criteria, however, the picture
darkens. Here Ghiselli (1966, p. 124) reports that for all tests
only 14.7 per cent had validity coefficients of .30 or better while
somewhat less than half, 44.5 per cent, had coefficients in excess
of .20.



Ghiselli draws the following conclusions about average validity
coefficients:

. . . while the general predictive power of aptitude
tests in forecasting occupational success is by no
means zero, it is far from impressive. For all tests
and jobs as a whole, a coefficient of the order of
.30 describes the general validity of tests for
training criteria, and one of the order of .20 gives
the value for proficiency criteria. (Ghiselli, 1966,
p. 125.)

It might be worth remembering, at this point, that if the
validity coefficient is based on 100 cases, it needs to exceed
.195 to be significantly different from zero at the .05 level of
significance and to exceed .230 at the .01 level. Or to put it
another way: A client whose test score places him in the upper
quarter of his group has about one chance in four of being in the
upper quarter on the criterion measure and an equal chance of
being in the lowest quarter when the validity coefficient is about
.0. Even when the validity coefficient is as high as .40 (and
this is somewhat higher than the average for Ghiselli's findings),
the client's chances of being in the upper quarter on the criterion
measure when he scores in the upper quarter on the predictive test
are considerably less than 50-50; they are 428 chances out of 1000.
One is inclined to paraphrase the old saying as "With validity
coefficients like this, who needs enemies?"
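
The upper-quarter chances quoted above can be approximated numerically. The sketch below rests on an assumption not stated in the speech, namely that predictor and criterion scores follow a bivariate normal distribution; the function name `chance_top_quarter` and the integration settings are likewise illustrative, and only the Python standard library is used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def chance_top_quarter(validity: float, steps: int = 20000) -> float:
    """P(criterion in top quarter | predictor in top quarter).

    Assumes bivariate normal scores with correlation `validity`
    (an assumption of this sketch); requires -1 < validity < 1.
    """
    cut = STD_NORMAL.inv_cdf(0.75)         # z-score bounding the top quarter
    spread = (1.0 - validity ** 2) ** 0.5  # residual SD of the criterion
    upper = 8.0                            # far enough into the upper tail
    width = (upper - cut) / steps
    joint = 0.0
    for i in range(steps):                 # midpoint rule over predictor z
        x = cut + (i + 0.5) * width
        p_criterion = 1.0 - STD_NORMAL.cdf((cut - validity * x) / spread)
        joint += STD_NORMAL.pdf(x) * p_criterion * width
    return joint / 0.25                    # condition on a top-quarter predictor


for r in (0.0, 0.20, 0.30, 0.40):
    print(f"validity {r:.2f}: chance {chance_top_quarter(r):.3f}")
```

Under that assumption a validity of zero gives one chance in four, and a validity of .40 gives a figure of roughly .43, in line with the 428 chances in 1000 cited in the text.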



But, of course, these are average validity coefficients
reported by Ghiselli, and we all know that counselors use only the
best tests. What does Ghiselli have to say on this score? Just
this:

The highest validities found in any of the single
studies reviewed for this summary, studies in which
more than one hundred workers were used, were .77
for training criteria and .66 for proficiency
criteria. In both of these investigations the
coefficients were for intelligence tests applied
in the trades and crafts. (Ghiselli, 1966, p. 126.)

With a correlation coefficient of .77 the coefficient of
determination is .59 and with a correlation of .66 the coefficient
of determination is .44. (See Croxton and Cowden, 1943, pp.
663-664.) This suggests that in the former case slightly over half
the variability in the criterion is accounted for by variability in
the predictor and in the case of the validity coefficient of .66 only
44 per cent of the variability in job proficiency is accounted for
in the predictor measure.
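
The coefficients of determination above are simply the squares of the validity coefficients. A minimal sketch of the arithmetic follows; the function name is hypothetical, and the two values are the highest validities Ghiselli reports.

```python
def coefficient_of_determination(r: float) -> float:
    """Proportion of criterion variance accounted for by the predictor."""
    return r ** 2


# Highest single-study validities quoted from Ghiselli (1966, p. 126).
for label, r in (("training", 0.77), ("proficiency", 0.66)):
    share = coefficient_of_determination(r)
    print(f"{label}: r = {r:.2f}, r^2 = {share:.2f}, unexplained = {1 - share:.2f}")
```

Read this way, even the best coefficients leave roughly 41 and 56 per cent of the criterion variance unaccounted for.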



Ghiselli's parting comment needs to be considered by counselors
who plan to use tests:

It is apparent that even the most optimistic
supporter of tests cannot claim that they predict
occupational success with what might be termed a
high degree of accuracy. Nevertheless, in most
situations tests can have a sufficiently high
degree of predictive power to be of considerable
practical value in the selection of personnel.
(Ghiselli, 1966, p. 127.)
If, as we indicated earlier, counseling is intended to help
the client correct his misperceptions and reduce ambiguity in
reality, and if, as Ghiselli suggests, tests are likely to provide
us with only a limited and often distorted view of the events we
wish to observe, then we may be inclined to conclude that testing
in counseling may not only frustrate our chances of assisting the
client in solving his immediate problem, but may, in fact, engender
in the client a self-defeating approach to generalized
problem-solving, for the inclusion of highly ambiguous data in the
problem-solving process may, and I fear often does, acclimatize the
client to a toleration of superstition and ritual in the solution of
problems and teach him irrationality in problem-solving which, to
succeed, needs to become a highly logical and rational process.

All is not lost, however, despite the obvious limitations
placed on tests by their apparently low validities. Goldman has
suggested a model for the dimensions of interpretation in counseling
(Goldman, 1961, p. 143ff.). Three dimensions are suggested: (1) type
of data (including test and non-test data), (2) type of treatment
of data (including actuarial and clinical), and (3) type of
interpretation (including descriptive, genetic, predictive, and
evaluative). It is possible in this three-dimensional format to
generate sixteen kinds of interpretation of events which play a
part in the client's problem and its solution. About half of
these relate to test data. About half of the test interpretations,
according to the model, are actuarial (see Meehl, 1954), in which
test validity plays an important role. Thus according to the
Goldman model only about one-quarter of the kinds of interpretations
generated in the course of problem-solving in counseling are
dependent upon test validity. (The validity of other types of
interpretation needs also to be questioned, but this is not our
present concern.)
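
The count of sixteen and the one-quarter share follow from crossing the three dimensions. The sketch below enumerates the cells, using the category labels given in the text; the variable names are illustrative, and the model itself is Goldman's, not this code's.

```python
from itertools import product

# The three dimensions of interpretation, as listed in the text.
data_types = ("test", "non-test")
treatments = ("actuarial", "clinical")
interpretations = ("descriptive", "genetic", "predictive", "evaluative")

# Every combination of one value per dimension is one kind of interpretation.
cells = list(product(data_types, treatments, interpretations))
test_cells = [c for c in cells if c[0] == "test"]
test_actuarial = [c for c in test_cells if c[1] == "actuarial"]

print(len(cells), len(test_cells), len(test_actuarial))
print(len(test_actuarial) / len(cells))
```

Two kinds of data, two treatments, and four types of interpretation give sixteen cells, of which eight involve test data and four are both test-based and actuarial, that is, one quarter of the whole.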

This state of affairs may help to lead us out of our morass
of ambiguity. In the case of actuarial interpretation and prediction
of behavior based on test data we need to place greater reliance
on multivariate analysis than we have done heretofore. (See, for
example: Rulon, et al., 1967.) Devices and programs for their
use are now available. What is required now is the rigorous
collection of prediction and criterion data, all sorts of data, on a
wide scale, and the analysis and reporting of it in a form that is
not only useful to the counselor, but also communicable to the
client. Such an approach appears to give promise of moving test data
away from their present status as a piece of superstitious tribal
ritual and nearer to the role of the unambiguous picture of reality
so desperately needed in the solution of human problems.

Another approach is the application of what I have called
description by extension (Weitz, 1964, pp. 64-65). Here the other
three-quarters of the Goldman model come into play. If an event
is described in sufficient detail, the inconsistent elements will
begin to emerge. Thus when we find a student whose level of general
test-measured ability is high, whose test-measured abilities and
interests would seem to make him admirably suited to a career in
medicine, but who appears to be failing the first chemistry course
in a premedical curriculum, we do not throw out the test scores on
the basis of their low validity in this instance, nor do we urge
the client to "keep at it, do your best, because the tests show that
you can become a successful and happy surgeon, if you'll only try."
The inconsistency of the data here suggests that we may be trying
to solve the wrong problem or, at least, that we are attacking
problems in the wrong order. Further extensive description of the
client's behavior, including additional, but different, test data,
may help us redefine our problem in a way that permits a rational
and realistic solution.



With such an approach we are more likely to help our client
resolve his immediate problem and at the same time assist him to
acquire one of the first essentials in generalized problem-solving
behavior: defining the problem in all of its essential elements and
avoiding quick solutions to poorly defined problems that may, in
fact, not exist. Until such time as tests of ability and inventories
of personality traits and interests are developed that have validity
coefficients sufficiently high ([illegible] or better) for the
prediction of individual performance against clearly specified and
relevant criteria, we are faced with the alternative of refusing to
use tests in counseling at all or using them as part of an extensive
description including multivariate analysis. Of course, we can
always go on doing what we have been doing for many years: using
tests as a ritual for placating the gods of Chaos.